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Abstract — In social settings, individuals interact through webs 
of relationships. Each individual is a node in a complex network 
(or graph) of interdependencies and generates data, lots of data. 
We label the data by its source, or formally stated, we index 
the data by the nodes of the graph. The resulting signals (data 
indexed by the nodes) are far removed from time or image 
signals indexed by well ordered time samples or pixels. DSP, 
discrete signal processing, provides a comprehensive, elegant, 
and efficient methodology to describe, represent, transform, 
analyze, process, or synthesize these well ordered time or image 
signals. This paper extends to signals on graphs DSP and its 
basic tenets, including filters, convolution, z-transform, impulse 
response, spectral representation, Fourier transform, frequency 
response, and illustrates DSP on graphs by classifying blogs, 
linear predicting and compressing data from irregularly located 
weather stations, or predicting behavior of customers of a mobile 
service provider. 

Keywords: Network science, signal processing, graphical 
models, Markov random fields, graph Fourier transform. 

L Introduction 

There is an explosion of interest in processing and analyzing 
large datasets collected in very different settings, including 
social and economic networks, information networks, internet 
and the world wide web, immunization and epidemiology 
networks, molecular and gene regulatory networks, citation 
and coauthorship studies, friendship networks, as well as 
physical infrastructure networks like sensor networks, power 
grids, transportation networks, and other networked critical 
infrastructures. We briefly overview some of the existing work. 

Many authors focus on the underlying relational structure 
of the data by: 1) inferring the structure from community 
relations and friendships, or from perceived alliances between 
agents as abstracted through game theoretic models H], ||2l; 
2) quantifying the connectedness of the world; and 3) de- 
termining the relevance of particular agents, or studying the 
strength of their interactions. Other authors are interested 
in the network function by quantifying the impact of the 
network structure on the diffusion of disease, spread of news 
and information, voting trends, imitation and social influence, 
crowd behavior, failure propagation, global behaviors devel- 
oping from seemingly random local interactions (T\, [T|, f4\. 
Much of these works either develop or assume network models 
that capture the interdependencies among the data and then 
analyze the structural properties of these networks. Models 
often considered may be deterministic like complete or regular 
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graphs, or random like the Erdos-Renyi and Poisson graphs, 
the configuration and expected degree models, small world or 
scale free networks 121, ID, to mention a few. These models 
are used to quantify network characteristics, such as connect- 
edness, existence and size of the giant component, distribution 
of component sizes, degree and clique distributions, and node 
or edge specific parameters including clustering coefficients, 
path length, diameter, betweenness and closeness centralities. 

Another body of literature is concerned with inference and 
learning from such large datasets. Much work falls under 
the generic label of graphical models [51, |l6l, Q, |[8|, |9|, 
[10|. In graphical models, data is viewed as a family of 
random variables indexed by the nodes of a graph, where 
the graph captures probabilistic dependencies among data 
elements. The random variables are described by a family of 
joint probability distributions. For example, directed (acyclic) 
graphs fm . lfT2l represent Bayesian networks where each 
random variable is independent of others given the variables 
defined on its parent nodes. Undirected graphical models, also 
referred to as Markov random fields [13|, |14|, describe data 
where the variables defined on two sets of nodes separated by 
a boundary set of nodes are statistically independent given the 
variables on the boundary set. A key tool in graphical models 
is the Hammersley-Clifford theorem |[l3l, HSl, |[l6l, and the 
Markov-Gibbs equivalence that, under appropriate positivity 
conditions, factors the joint distribution of the graphical model 
as a product of potentials defined on the cliques of the graph. 
Graphical models exploit this factorization and the structure 
of the indexing graph to develop efficient algorithms for 
inference by controlling their computational cost. Inference 
in graphical models is generally defined as finding from the 
joint distributions lower order marginal distributions, likeli- 
hoods, modes, and other moments of individual variables or 
their subsets. Common inference algorithms include belief 
propagation and its generalizations, as well as other message 
passing algorithms. A recent block-graph algorithm for fast 
approximate inference, in which the nodes are non-overlapping 
clusters of nodes from the original graph, is in |17|. Graphical 
models are employed in many areas; for sample applications, 
see flHl and references therein. 

Extensive work is dedicated to discovering efficient data 
representations for large high-dimensional data |fT9l . 1201 . 
fT\], f22]. Many of these works use spectral graph theory and 
the graph Laplacian ||23l to derive low-dimensional represen- 
tations by projecting the data on a low-dimensional subspace 
generated by a small subset of the Laplacian eigenbasis. The 
graph Laplacian approximates the Laplace-Beltrami operator 
on a compact manifold l24l . 1211 . in the sense that if the 
dataset is large and samples uniformly randomly a low- 
dimensional manifold then the (empirical) graph Laplacian 
acting on a smooth function on this manifold is a good discrete 
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approximation that converges pointwise and uniformly to the 
elliptic Laplace-Beltrami operator applied to this function as 
the number of points goes to infinity 1251 . ||26l . ||27 l. One 
can go beyond the choice of the graph Laplacian by choos- 
ing discrete approximations to other continuous operators 
and obtaining possibly more desirable spectral bases for the 
characterization of the geometry of the manifold underlying 
the data. For example, if the data represents a non-uniform 
sampling of a continuous manifold, a conjugate to an elliptic 
Schrodinger-type operator can be used ll28l . l|29l . Il30l . 

More in line with our paper, several works have proposed 
multiple transforms for data indexed by graphs. Examples in- 
clude regression algorithms ||3T1 . wavelet decompositions ll32l . 
ESI, f34l, r30l, f35l, filter banks on graphs f36l, f37l, de- 
noising |38|, and compression |39|. Some of these transforms 
focus on distributed processing of data from sensor fields 
while addressing sampling irregularities due to random sensor 
placement. Others consider localized processing of signals on 
graphs in multiresolution fashion by representing data using 
wavelet-like bases with varying "smoothness" or defining 
transforms based on node neighborhoods. In the latter case, 
the graph Laplacian and its eigenbasis are sometimes used 
to define a spectrum and a Fourier transform of a signal on a 
graph. This definition of a Fourier transform was also proposed 
for use in uncertainty analysis on graphs B^ . BTl . This graph 
Fourier transform is derived from the graph Laplacian and 
restricted to undirected graphs with real, non-negative edge 
weights, not extending to data indexed by directed graphs or 
graphs with negative or complex weights. 

The algebraic signal processing (ASP) theory B2l . B3l . 
P4l . 1451 is a formal, algebraic approach to analyze data 
indexed by special types of line graphs and lattices. The 
theory uses an algebraic representation of signals and filters 
as polynomials to derive fundamental signal processing con- 
cepts. This framework has been used for discovery of fast 
computational algorithms for discrete signal transforms B2l . 
||46l , PTTl . It was extended to multidimensional signals and 
nearest neighbor graphs ||481 , |49| and applied in signal 
compression ||50l , lISTl . The framework proposed in this paper 
generalizes and extends the ASP to signals on arbitrary graphs. 

Contribution 

Our goal is to develop a linear discrete signal processing 
(DSP) framework and corresponding tools for datasets arising 
from social, biological, and physical networks. DSP has been 
very successful in processing time signals (such as speech, 
communications, radar, or econometric time series), space- 
dependent signals (images and other multidimensional signals 
like seismic and hyperspectral data), and time-space signals 
(video). We refer to data indexed by nodes of a graph as 
a graph signal or sirnply signal and to our approach as 
DSP on graphs (DSPgo We introduce the basics of lineaid 

'The term "signal processing for graplis" lias been used in |52| . 1531 in 
reference to graph structure analysis and subgraph detection. It should not be 
confused with our proposed DSP framework, which aims at the analysis and 
processing of data indexed by the nodes of a graph. 

-We are concerned with linear operations; in the sequel, we refer only to 
DSPq but have in mind that we are restricted to linear DSPq . 



DSPg, including the notion of a shift on a graph, filter 
structure, filtering and convolution, signal and filter spaces 
and their algebraic structure, the graph Fourier transform, 
frequency, spectrum, spectral decomposition, and impulse and 
frequency responses. With respect to other works, ours is a 
deterministic framework to signal processing on graphs rather 
than a statistical approach Uke graphical models. Our work 
is an extension and generalization of the traditional DSP, 
and generahzes the ASP theory [42], gS, 144 1, [45 1 and its 
extensions and applications l49l , l50l . BTl . We emphasize 
the contrast between the DSPq and the approach to the graph 
Fourier transform that takes the graph Laplacian as a point of 
departure EH, lIMl, iMl, lES, ED, gT]. In the latter case, 
the Fourier transform on graphs is given by the eigenbasis of 
the graph Laplacian. However, this definition is not applicable 
to directed graphs, which often arise in real-world problems, 
as demonstrated by examples in Section IVII and graphs with 
negative weights. In general, the graph Laplacian is a second- 
order operator for signals on a graph, whereas an adjacency 
matrix is a first-order operator. Deriving a graph Fourier trans- 
form from the graph Laplacian is analogous in traditional DSP 
to restricting signals to be even (like correlation sequences) 
and Fourier transforms to represent power spectral densities 
of signals. Instead, we demonstrate that the graph Fourier 
transform is properly defined through the Jordan normal form 
and generalized eigenbasis of the adjacency matrijJI. Finally, 
we illustrate the DSPg with applications like classification, 
compression, and linear prediction for datasets that include 
blogs, customers of a mobile operator, or collected by a 
network of irregularly placed weather stations. 

II. Signals on Graphs 

Consider a dataset with N elements, for which some rela- 
tional information about its data elements is known. Examples 
include preferences of individuals in a social network and 
their friendship connections, the number of papers published 
by authors and their coauthorship relations, or topics of 
online documents in the World Wide Web and their hyperlink 
references. This information can be represented by a graph 
G = (V, A), where V — {fo, • • ■ , vm~i} is the set of nodes 
and A is the weig htecfl adjacency matrix of the graph. Each 
dataset element corresponds to node Vn, and each weight 
A„_„i of a directed edge from to Vn reflects the degree 
of relation of the nth element to the mth one. Since data 
elements can be related to each other differently, in general, 
G is a directed, weighted graph. Its edge weights A„.„j are not 
restricted to being nonnegative reals; they can take arbitrary 
real or complex values (for example, if data elements are 
negatively correlated). The set of indices of nodes connected 
to Vn is called the neighborhood of Vn and denoted by 

Un = {m I An,m ^ 0}. 

^ Parts of this material also appeared in 1541 . [551 . In this paper, we present 
a complete theory with all derivations and proofs. 

''Some literature defines the adjacency matrix A of a graph G = (V, A) 
so that An,m only takes values of or 1, depending on whether there is an 
edge from Vm to d„ , and specifies edge weights as a function on pairs of 
nodes. In this paper, we incorporate edge weights into A. 
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(a) Time series 



(b) Digital image 




(c) Sensor field (d) Hyperlinked documents 

Fig. 1. Graph representations for different datasets (graph signals.) 

Assuming, without a loss of generality, that dataset elements 
are complex scalars, we define a graph signal as a map from 
the set V of nodes into the set of complex numbers C: 

s : V ^ C, 

Vn ^ Sn- (1) 

Notice that each signal is isomorphic to a complex-valued 
vector with N elements. Hence, for simplicity of discussion, 
we write graph signals as vectors s = (so si ... sat-i) , 
but remember that each element s„ is indexed by node u„ of 
a given representation graph G — (V, A), as defined by ([T]i. 
The space S of graphs signals ([T]i then is identical to C^. 

We illustrate representation graphs with examples shown 
in Fig. [T] The directed cyclic graph in Fig. |l(a)| represents a 
finite, periodic discrete time series [44 1. All edges are directed 
and have the same weight 1, reflecting the causality of a time 
series; and the edge from vm-i to vq reflects its periodicity. 
The two-dimensional rectangular lattice in Fig. |l(b)| represents 
a general digital image. Each node corresponds to a pixel, and 
each pixel value (intensity) is related to the values of the four 
adjacent pixels. This relation is symmetric, hence all edges are 
undirected and have the same weight, with possible exceptions 
of boundary nodes that may have directed edges and/or dif- 
ferent edge weights, depending on boundary conditions ||451 . 
Other lattice models can be used for images as well ll48l . 
The graph in Fig. |l(c)| represents temperature measurements 
from 150 weather stations (sensors) across the United States. 
We represent the relations of temperature measurements by 
geodesic distances between sensors, so each node is connected 
to its closest neighbors. The graph in Fig. |l(d)| represents a 
set of 50 political blogs in the World Wide Web connected by 
hyperlink references. By their nature, the edges are directed 
and have the same weights. We discuss the two latter examples 
in Section |yl] where we also consider a network of customers 
of a mobile service provider Clearly, representation graphs 
depend on prior knowledge and assumptions about datasets. 
For example, the graph in Fig. |l(d)| is obtained by following 
the hyperlinks networking the blogs, while the graph in 
Fig. |l(c)| is constructed from known locations of sensors under 
assumption that temperature measurements at nearby sensors 



have highly correlated temperatures. 

III. Filters on Graphs 

In classical DSP, signals are processed by filters — systems 
that take a signal as input and produce another signal as output. 
We now develop the equivalent concept of graph filters for 
graph signals in DSPg- We consider only linear, shift-invariant 
filters, which are a generalization of linear time-invariant filters 
used in DSP for time series. This section uses Jordan normal 
form and characteristic and minimal polynomials of matrices; 
these concepts are reviewed in Appendix A. The use of Jordan 
decomposition is required since for many real-world datasets 
the adjacency matrix A is not diagonalizable. One example is 
the blog dataset, considered in Section IVII 

Graph Shift 

In classical DSP, the basic building block of filters is a 
special filter x = called the time shift or delay [56|. This 
is the simplest non-trivial filter that delays the input signal s 
by one sample, so that the n\h sample of the output is s„ = 
Sn-i mod N- Using the graph representation of finite, periodic 
time series in Fig. |l(a)[ for which the adjacency matrix is the 
N X N circulant mati'ix A = Cat, with weights gH, ll44l 



if n — m = 1 mod N 
otherwise 



(2) 



we can write the time shift operation as 



s = Cjvs = As. (3) 

In DSPg, we extend the notion of the shift ^ to general 
graph signals s where the relational dependencies among the 
data are represented by an arbitrary graph G = (V,A). We 
call the operation (O the graph shift. It is realized by replacing 
the sample s„ at node Vn with the weighted linear combination 
of the signal samples at its neighbors: 



Af-l 



5n — ^ ^ ^71,'. 



m— 



(4) 



Note that, in classical DSP, shifting a finite signal requires 
one to consider boundary conditions. In DSPq, this problem 
is implicitly resolved, since the graph G — (V, A) explicitly 
captured the boundary conditions. 

Graph Filters 

Similarly to traditional DSP, we can represent filtering 
on a graph using matrix-vector multiplication. Any system 
H G C^^^, or graph filter, that for input s G 5 produces 
output Hs represents a linear system, since the filter's output 
for a linear combination of input signals equals the linear 
combination of outputs to each signal: 

H(asi + /3s2) = aHsi + /3Hs2. 

Furthermore, we focus on shift-invariant graph filters, for 
which applying the graph shift to the output is equivalent to 
applying the graph shift to the input prior to filtering: 



A(Hs) = H(As) 



(5) 
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The next theorem establishes that all linear, shift-invariant 
graph filters are given by polynomials in the shift A. 

Theorem 1: Let A be the graph adjacency matrix and 
assume that its characteristic and minimal polynomials are 
equal: pa{x) — mA.{x). Then, a graph filter H is linear and 
shift invariant if and only if (iff) H is a polynomial in the 
graph shift A, i.e., iff there exists a polynomial 



h{x) — ho + hix + . . . + h^x^" 
with possibly complex coefficients hi G C, such that: 



H = h{A) = hoI+hiA 



hjA'- 



(6) 



(7) 



Proof: Since the shift-invariance condition (|5]i holds for 
all graph signals s G 5 = C^, the matrices A and H 
commute: AH = HA. As pa{x) = mA.{x), all eigenvalues 
of A have exactly one eigenvector associated with them, ||57]| , 
||58l. Then, the graph matrix H commutes with the shift A iff 
it is a polynomial in A (see Proposition 12.4.1 in [58 1). ■ 
Analogous to the classical DSP, we call the coefficients hi 
of the polynomial h{x) in (|6]l the graph filter taps. 

Properties of Graph Filters 

Theorem [T] requires the equality of the characteristic and 
minimal polynomials pa{x) and mA{x). This condition does 
not always hold, but can be successfully addressed through 
the concept of equivalent graph filters, as defined next. 

Definition 1: Given any shift matrices A and A, filters 
h{A) and .g(A) are called equivalent if for all input signals 
s G 5 they produce equal outputs: /i(A)s = g(A)s. 

Note that, when no restrictions are placed on the signals, 
so that S = C^, Definition [T] is equivalent to requiring 
/i(A) = g{A) as matrices. However, if additional restrictions 
exist, filters may not necessarily be equal as matrices and still 
produce the same output for the considered set of signals. 

It follows that, given an arbitrary G = (V, A) with pa{x) ^ 
■mA{x), we can consider another graph G = (V, A) with the 
same set of nodes V but potentially different edges and edge 
weights, for which pjj^x) = m^{x) holds true. Then graph 
filters on G can be expressed as equivalent filters on G, as 
described by the following theorem (proven in Appendix B). 

Theorem 2: For any matrix A there exists a matrix A and 
polynomial r{x), such that A ~ r{A) and p^(a;) = m^{x). 

As a consequence of Theorem IH any filter on the graph 
G — (V, A) is equivalent to a filter on the graph G — (V, A), 
since h{A) — h{r{A)) — {ho r)(A), where h o r is the 
composition of polynomials h and r and thus is a polynomial. 
Thus, the condition pA {x) ~ ttla (x) in Theorem [T] can be 
assumed to hold for any graph G — (V,A). Otherwise, by 
Theorem |2] we can replace the graph by another G — (V, A) 
for which the condition holds and assign A to A. 

The next result demonstrates that we can limit the number 
of taps in any graph filter. 

Theorem 3: Any graph filter (Q has a unique equivalent 
filter on the same graph with at most degniA{x) = Na taps. 

Proof: Consider the polynomials h{x) in (|6]). By polyno- 
mial division, there exist unique polynomials q{x) and r{x): 



h{x) — q{x)inA{x) + r{x), (8) 

where degr(x) < Na- Hence, we can express (|7]l as 

h{A) = q(A)TOA(A) + r(A) = q{A) On +r(A) = r(A). 

Thus, h{A) — r{A) and degr(x) < degmA(a;). ■ 
As follows from Theorem [3] all linear, shift-invariant fil- 
ters (|2| on a graph G — (V, A) form a vector space 



T= <U: H = 



E 



hpA' 



hieC 



(9) 



Moreover, addition and multiplication of filters in T produce 
new filters that are equivalent to filters in J^. Thus, J- is closed 
under these operations, and has the structure of an algebra 11431 . 
We discuss it in detail in Section HVl 

Another consequence of Theorem [3] is that the inverse of a 
filter on a graph, if it exists, is also a filter on the same graph, 
i.e., it is a polynomial in (|9]l. 

Theorem 4: A graph filter H = h{A) G is invertible 
iff polynomial h{x) satisfies h{Xm) ^ for all distinct 
eigenvalues Ao,...,Am-i, of A. Then, there is a unique 
polynomial g{x) of degree A<i%g(x) < Na that satisfies 

h{A)-^ = giA) e T. (10) 

Appendix C contains the proof and the procedure for the 
construction of g{x). Theorem |4] implies that instead of 
inverting the N x N matrix h{A) directly we only need to 
construct a polynomial g{x) specified by at most A^a taps. 

Finally, it follows from Theorem [3] and (|9]l that any 
graph filter h{A) G is completely specified by its taps 
ho, ■ ■ ■ , hNj^~i- As we prove next, in DSPg, as in traditional 
DSP, filter taps uniquely determine the impulse response of 
the filter, i.e., its output u = {go, ■ ■ ■ , gN-i) for unit impulse 
input S — {1,0, ... , 0)"^, and vice versa. 

Theorem 5: The filter taps ho, . . . , h^A-i of the filter h{A) 
uniquely determine its impulse response u. Conversely, the im- 
pulse response u uniquely determines the filter taps, provided 
rank A = Na, where A = (A"5, . . . , A^^^M) . 

Proof: The first part follows from the definition of filter- 
ing: u ~ h{A)5 — Ah yields the first column of h{A), which 
is uniquely determined by the taps h = {ho, . . . , h^j^-i)'^ . 
Since we assume pa{x) — mA{x), then N = Na, and the 
second part holds if A is invertible, i.e., rank A = A'a- ■ 

Notice that a relabeling of the nodes vo, ■ ■ ■ , vn-i does not 
change the impulse response. If P is the corresponding permu- 
tation matrix, then the unit impulse is PS, the adjacency matrix 
is PAP^, and the filter becomes h{PAP^) = Pft.(A)P'^. 
Hence, the impulse response is simply reordered according to 
same permutation: P/i(A)P"^P^ = Pu. 

IV. Algebraic Model 

So far, we presented signals and filters on graphs as vectors 
and matrices. An alternative representation exists for filters and 
signals as polynomials. We call this representation the graph 
z-transform, since, as we show, it generalizes the traditional 
z-transform for discrete time signals that maps signals and 
filters to polynomials or series in z~^. The graph z-transform 
is defined separately for graph filters and signals. 
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Consider a graph G — (V, A), for which the characteristic 
and minimal polynomials of the adjacency matrix coincide: 
Pa{x) — mA.{x). The mapping A M> a; of the adjacency 
matrix A to the indeterminate x maps the graph filters H = 
h{A) in T to polynomials h{x). By Theorem [3] the filter 
space J-" in (|9]l becomes a polynomial algebra P3]| 

A^C[x]/mAix). (11) 

This is a space of polynomials of degree less than degmA(2:) 
with complex coefficients that is closed under addition and 
multiplication of polynomials modulo mA(a;)- The mapping 
J- ^ A, h{A) 1-^ h{x), is a isomorphism of C-algebras |(43|, 
which we denote as ^ ^. We call it the graph z-transform 
of filters on graph G ~ (V, A). 

The signal space iS is a vector space that is also closed 
under filtering, i.e., under multiplication by graph filters from 
F: for any signal s G 5 and filter h{A), the output is a signal 
in the same space: h{A)s e S. Thus, S is an J^-module [43 il . 
As we show next, the graph z-transform of signals is defined 
as an isomorphism ( fTST l from S to an ^-module. 

Theorem 6: Under the above conditions, the signal space S 
is isomorphic to an yl-module 

M = C[x]/pAix) - is{x) - ^nbnix)] (12) 

I n=0 J 

under the mapping 

S = (so, . . . , SAT-l)^ i-^ s{x) = ^ Snbn{x). (13) 

ri=0 

The polynomials bo{x), . . . , &Ar_i(a;) are linearly independent 
polynomials of degree at most — 1. If we write 

h{x)^{boix),...,bN-iix)f , (14) 
then the polynomials satisfy 

(15) 

for < r < Rm.o and < m < AI, where A„i and Vm,o,r 
are generalized eigenvectors of A-^, and bn\Xm) denotes the 
rth derivative of bn{x) evaluated at a; = Xm- Filtering in M 
is performed as multiplication modulo pa{x)'- if s = h{A)s, 
then 

N-l 

s s{x) = Snbn{x) — h{x)s{x) uiod paIx). (16) 

Proof: Due to the linearity and shift-invariance of graph 
filters, we only need to prove ( fT6] l for h{A) = A. Let us write 
s{x) = b(a;)-'"s and s{x) ~ b(a;)-'"s, where h{x) is given 
by (O. Since (O must hold for all s eS, for h{A) = A it 
is equivalent to 

b(a;)^s b(a::)^ (As) = b(a::)^ (xs) mod PAix) 
^ {A^ -xl)h{x)=cpAix), (17) 

where c G is a vector of constants, since degpA(a;) = N 
and deg (xb^ix)) < N for <n<N. 



It follows from the factorization ( |43] ) of pA {x) that, for each 
eigenvalue A„i and < k < Am, the characteristic polynomial 
satisfies p^''(A,„) = 0. By taking derivatives of both sides 
of ( fTTb and evaluating at x — Xm, < m < AI, we construct 
Aq + . . . + Am-1 ~ N linear equations 

(A^-A„j)b(A„) = 
(A^-A„,l)bW(A„) = rh{Xm), l<r<Am 

Comparing these equations with ( |35] |. we obtain ( fTsl l. Since N 
polynomials bn{x) = bn,o + . . . + bn.N-ix'^^^ are character- 
ized by N'^ coefficients 6„_fe, < n, k < N, (fTSl l is a system 
of N linear equations with N"^ unknowns that can always be 
solved using inverse polynomial interpolation L581 . ■ 
Theorem |6] extends to the general case Pa{x) 7^ mA{x). 
By Theorem |2] there exists a graph G = (V, A) with 
Pa(^) ~ '^a(^)' ^^^^ '^h^t A = r{A). By mapping A to x, 
the filter space (|9]l has the structure of the polynomial algebra 
A — C[x]/mA{r{x)) = C[x]/{mA o 'r){x)) and the signal 
space has the structure of the ^-module A4 — C[x]/p^{x). 
Multiplication of filters and signals is performed modulo 
p^{x). The basis of Ai satisfies (fTsl l. where A„i and Vm,d,r 
are eigenvalues and generalized eigenvectors of A. 

V. Fourier Transform on Graphs 

After establishing the structure of filter and signal spaces in 
DSPg, we define other fundamental DSP concepts, including 
spectral decomposition, signal spectrum, Fourier transform, 
and frequency response. They are related to the Jordan normal 
form of the adjacency matrix A, reviewed in Appendix A. 

Spectral Decomposition 

In DSP, spectral decomposition refers to the identification 
of subspaces So,...,Sk-i of the signal space S that are 
invariant to filtering, so that, for any signal G Sk and filter 
h{A) G J-, the output = h{A)sk lies in the same subspace 
Sk- A signal s G 5 can then be represented as 

s = So + si + . . . + sk_i, (18) 

with each component s^ G Sk - Decomposition (fTSl l is uniquely 
determined for every signal s G 5 if and only if: 1) invariant 
subspaces Sk have zero intersection, i.e., Sk n Sm = {0} for 
k ^ m; 2) dimiSo + . . . + dimSK-i = dim 5 — N; and 
3) each Sk is irreducible, i.e., it cannot be decomposed into 
smaller invariant subspaces. In this case, S is written as a 
direct sum of vector subspaces 

S = So®Si(B.-.®Sk^i- (19) 

As mentioned, since the graph may have arbitrary struc- 
ture, the adjacency matrix A may not be diagonalizable; 
in fact, A for the blog dataset (see Section IVIb is not 
diagonalizable. Hence, we consider the Jordan decomposi- 
tion (|39] | A = VJV~^, which is reviewed in Appendix 
A. Here, J is the Jordan normal form ( |40| |. and V is 
the matrix of generalized eigenvectors (|38] |. Let Sm,d = 
span{v„_d^o, ■ • ■ , Vm,d,_R„,d-i} be a vector subspace of S 
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spanned by the dth Jordan chain of Am. Any signal Sm,d G 
Sm,d has a unique expansion 



S7n,d,R„,d-l) J 



where V.,„ ^ is the block of generalized eigenvectors ( [37] i. 
As follows from the Jordan decomposition ( [39l l. shifting the 
signal s„i produces the output s^.d G 5m, d from the same 
subspace, since 



-^Sm,d — ^^m.d \^m,d,0 
Vm,d Jfl,„,d(Am) (s 



(20) 



AmSm,ci,fl,„ d~2 + Sm,d,R„i_d 
\ ^mSm,d,Rrn,d~l 

Hence, each subspace Sm.d < 5 is invariant to shifting. 
Using ( [39l ) and Theorem [T] we write the graph filter O as 

L L 

/i(A) = ^/i£(VJV-i)'^ =^/ifVJ^V-i 

:V/t(J)V"^ (21) 



f=0 



Similarly to (|20] l. we observe that filtering a signal Sm,d £ 
'5m, d produces an output ?m,d S ^m,^ from the same subspace: 

Sm,d,0 

Sm,d = h{A)Sm.A = h{A) W rn,d 



\Sm,d,_R„,d-l/ 
^m,d.G 



(22) 



VST?i,d,_R„_d-l y 

Since all N generalized eigenvectors of A are linearly inde- 
pendent, all subspaces Sm,d have zero intersections, and their 
dimensions add to N . Thus, the spectral decomposition ( fT9] l 
of the signal space S is 



M-l D„ 



(23) 



m=0 d=0 



Graph Fourier Transform 

The spectral decomposition ( |23T l expands each signal s € S 
on the basis of the invariant subspaces of S. Since we chose 
the generalized eigenvectors as bases of the subspaces Sm,d, 
the expansion coefficients are given by 

s = V s, (24) 

where V is the generalized eigenvector matrix (|38] |. The vector 
of expansion coefficients is given by 

s = V"^s. (25) 

The union of the bases of all spectral components Sm,d, 
i.e., the basis of generalized eigenvectors, is called the graph 
Fourier basis. We call the expansion (l25l l of a signal s into the 



graph Fourier basis the graph Fourier transform and denote 
the graph Fourier transform matrix as 

r-l 



V 



(26) 



Following the conventions of classical DSP, we call the 
coefficients s„ in (|25l l the spectrum of a signal s. The inverse 
graph Fourier transform is given by ( l24] l: it reconstructs the 
signal from its spectrum. 

Frequency Response of Graph Filters 

The frequency response of a filter characterizes its effect on 
the frequency content of the input signal. Let us rewrite the 
filtering of s by h{A) using ( 1211 1 and (|24] | as 

s = h{A)s = F"^ h{J) F s = F"^ h{3)s 
=^ Fs = /i(J)s. (27) 

Hence, the spectrum of the output signal is the spectrum of 
the input signal modified by the block-diagonal matrix 

A(Jr.o,o(Ao)) \ 

h{3)=\ 

(28) 

so that the part of the spectrum corresponding to the invariant 
subspace Sm,d is multiplied by h{3„i)- Hence, h{J) in (|28] l 
represents the frequency response of the filter h{A). 

Notice that ( |27] i also generalizes the convolution theorem 
from classical DSP |56| to arbitrary graphs. 

Theorem 7: Filtering a signal is equivalent, in the frequency 
domain, to multiplying its spectrum by the frequency response 
of the filter. 

Discussion 

The connection (l25T l between the graph Fourier transform 
and the Jordan decomposition ( |39] l highlights some desirable 
properties of representation graphs. For graphs with diago- 
nalizable adjacency matrices A, which have N linearly in- 
dependent eigenvectors, the frequency response ( |28] l of filters 
h{A) reduces to a diagonal matrix with the main diagonal 
containing values /i(Am), where Am are the eigenvalues of 
A. Moreover, for these graphs. Theorem |6] provides the 
closed-form expression (flST l for the inverse graph Fourier 
transform F^^ = V. Graphs with symmetric (or Hermitian) 
matrices, such as undirected graphs, are always diagonalizable 
and, moreover, have orthogonal graph Fourier transforms: 
F^^ = F^. This property has significant practical importance, 
since it yields a closed-form expression (fTSl l for F and F^^. 
Moreover, orthogonal transforms are well-suited for efficient 
signal representation, as we demonstrate in Section |VT] 

DSPg is consistent with the classical DSP theory. As 
mentioned in Section |II] finite discrete periodic time series 
are represented by the directed graph in Fig. |l(a)| The corre- 
sponding adjacency matrix is the N x N circulant matrix (|2]i. 
Its eigendecomposition (and hence, Jordan decomposition) is 



Cm = — DFT 



N 



N 



DFT 



■(Jv-i) 
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where DFT^v is the discrete Fourier transform matrix. Thus, 
as expected, the graph Fourier transform is F = DFT^r. 
Furthermore, for a general filter /i(Cjv) = ^^^^n^ 
coefficients of the output s = /i(Cjv)s are calculated as 



l-h{A) 



(I -MA))-' 



= hnSo + . . . + hf)S.n + hN^lSn+1 + • ■ • + hn+lSN-1 
Skh[n-k mod N) ■ 



N-l 

E 

fc=0 



This is the standard circular convolution. Theorem |5] holds as 
well, with impulse response identical to filter taps: u = h. 

Similarly, it has been shown in B31 . B3l that unweighted 
line graphs similar to Fig. |l(a)| but with undirected edges and 
different, non-periodic boundary conditions, give rise to all 
16 types of discrete cosine and sine transforms as their graph 
Fourier transform matrices. Combined with |[59l , it can be 
shown that graph Fourier transforms for images on the lattice 
in Fig. |l(b)| are different types of two-dimensional discrete 
cosine and sine transforms, depending on boundary conditions. 
This result serves as additional motivation for the use of these 
transforms in image representation and coding ll60l . 

In discrete-time DSP, the concepts of filtering, spectrum, 
and Fourier transform have natural, physical interpretations. 
In DSPg, when instantiated for various datasets, the interpre- 
tation of these concepts may be drastically different and not 
immediately obvious. For example, if a representation graph 
reflects the proximity of sensors in some metric (such as 
time, space, or geodesic distance), and the dataset contains 
sensor measurements, then filtering corresponds to linear re- 
combination of related measurements and can be viewed as a 
graph form of regression analysis with constant coefficients. 
The graph Fourier transform then decomposes signals over 
equilibrium points of this regression. On the other hand, if a 
graph represents a social network of individuals and their com- 
munication patterns, and the signal is a social characteristic, 
such as an opinion or a preference, then filtering can be viewed 
as diffusion of information along established communication 
channels, and the graph Fourier transform characterizes signals 
in terms of stable, unchangeable opinions or preferences. 

VI. Applications 

We consider several applications of DSPg to data pro- 
cessing. These examples illustrate the effectiveness of the 
framework in standard DSP tasks, such as predictive filtering 
and efficient data representation, as well as demonstrate that 
the framework can tackle problems less common in DSP, such 
as data classification and customer behavior prediction. 

Linear Prediction 

Linear prediction (LP) is an efficient technique for repre- 
sentation, transmission, and generation of time series Il6l1 . 
It is used in many applications, including power spectral 
estimation and direction of arrival analysis. Two main steps of 
LP are the construction of a prediction filter and the generation 
of an (approximated) signal, implemented, respectively, with 
forward and backward filters, shown in Fig. |2] The forward 
(prediction) filter converts the signal into a residual, which is 



(a) Forward (prediction) filter (b) Bacltward (synthesis) filter 

Fig. 2. Components of linear prediction. 



then closely approximated, for example, by a white noise-flat 
power spectrum signal or efficient quantization with few bits. 
The backward (synthesis) filter constructs an approximation of 
the original signal from the approximated residual. 

Using the DSPg, we can extend LP to graph signals. 
We illustrate it with the dataset ||62| of daily temperature 
measurements from sensors located near 150 major US cities. 
Data from each sensor is a separate time series, but encoding 
it requires buffering measurements from multiple days before 
they can be encoded for storage or transmission. Instead, we 
build a LP filter on a graph to encode daily snapshots of all 
150 sensor measurements. 

We construct a representation graph G = (V, A) for the 
sensor measurements using geographical distances between 
sensors. Each sensor corresponds to a node t;„, < n < 150, 
and is connected to K nearest sensors with undirected edges 
weighted by the normalized inverse exponents of the squared 
distances: if dnm denotes the distance between the nth and 
TOth sensor^ and m G 7V„, then 



A — 



(29) 



For each snapshot s of = 150 measurements, we 
construct a prediction filter h{A) with L taps by minimizing 
the energy of the residual r = s — h{A)s — (Ijv ~h{A)) s. 
We set /iQ = to avoid the trivial solution h{A) = I, and 
obtain 

(hi ... hL-if = {B^By^B'^s. 

Here, B = (As . . . A^^^s) is a A^ x (L - 1) matrix. The 
residual energy | |r| I2 is relatively small compared to the energy 
of the signal s, since shifted signals are close approximations 
of s, as illustrated in Fig. [3] This phenomenon provides the 
intuition for the graph shift: if the graph represents a similarity 
relation, as in this example, then the shift replaces each signal 
sample with a sum of related samples with more similar 
samples weighted heavier than less similar ones. 

The residual r is then quantized using B bits, and the 
quantized residual f is processed with the inverse filter to 
synthesize an approximated signal s = {In —h{A))~^r. 

We considered graphs with 1 < K < 15 nearest neighbors, 
and for each K constructed optimal prediction filters with 2 < 
L < 10 taps. As shown in Fig.|4] the lowest and highest errors 
||s — s||2/||s||2 were obtained for K — 11 and L — 3, and for 
K ~ 8 and L = 9. During the experiments, we observed that 
graphs with few neighbors (approximately, 3 < if < 7) lead to 
lower errors when prediction filters have impulse responses of 

'The construction of representation graphs for datasets is an important 
research problem and deserves a separate discussion that is beyond the scope 
of this paper The procedure we use here is a popular choice for construction 
of similaiity graphs based on distances between nodes 1211 . 1301 . 1351 . 



8 



40 



30 



Signal 

Shifted signal 
Twice shifted signal 




60 75 90 
Sensor index 



105 120 135 150 



Fig. 3. A signal representing a snapshot of temperature measurements from 
A'^ = 150 sensors. Shifting the signal produces signals similar to the original. 
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Bits used for quantization 
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Fig. 5. Average reconstruction en'or ||s — s[|2/|ls||2 for the compression of 
365 daily temperature snapshots based on the graph Fourier transform using 
KC < N coefficients. 



: . .. • , 



. • 

• • *. 
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Fig. 6. The Fourier basis vector that captures most energy of temperature 
measurements reflects the relative distribution of temperature across the 
mainland United States. The coefficients are normalized to the interval [0, 1]. 



Fig. 4. Average approximation errors ||s — s^||2/||s||2 for LP coding of 365 
signals s representing daily temperature snapshots. Graphs with 1 < < 15 
nearest neighbors for each sensor were analyzed, and filters with 2 < L < 10 
taps were constructed. The residual was quantized using 1 < i? < 16 bits. 
The lowest, second lowest, and highest errors were obtained, respectively for 
K = 11 and L = 3, K = 10 and L = 3, and X = 8 and L = 9. 

medium length (4 < L < 6), while graphs with 7 < K < 11 
neighbors yield lower errors for 3 < L < 5. Using larger 
values of K and L leads to large errors. This tendency may 
be due to overfitting filters to signals, and demonstrates that 
there exists a trade-off between graph and filter parameters. 

Signal Compression 

Efficient signal representation is required in multiple DSP 
areas, such as storage, compression, and transmission. Some 
widely-used techniques are based on expanding signals into or- 
thonormal bases with the expectation that most information is 
captured with few basis functions. The expansion coefficients 
are calculated using an orthogonal transform. If the transform 
represents a Fourier transform in some model, it means that 
signals are sparse in the frequency domain in this model, i.e., 
they contain only few frequencies. Some widely-used image 
compression standards, such as JPEG and JPEG 2000, use 
orthogonal expansions implemented, respectively, by discrete 
cosine and wavelet transforms |60|. 

As discussed in the previous example, given a signal s on a 
graph G — (V, A), where A reflects similarities between data 
elements, the shifted signal As can be a close approximation 
of s, up to a scalar factor: As « ps. This is illustrated in 



Fig. |3] where p w 1. Hence, s can be effectively expressed as 
a linear combination of a few [generalized] eigenvectors of A. 

Consider the above dataset of temperature measurements. 
The matrix A in ( |29] l is symmetric by construction, hence 
its eigenvectors form an orthonormal basis, and the graph 
Fourier transform matrix F is orthogonal. In this case, we can 
compress each daily update s of = 150 measurements by 
keeping only the C spectrum coefficients dZSl l s„ with largest 
magnitudes. Assuming that |so| > |si| > ... > |s7v_i|, the 
signal reconstructed after compression is 

s = F-i(so,...,sc-i,0,...,Of . (30) 

Fig. |5] shows the average reconstruction errors obtained by 
retaining 1 < C < spectrum coefficients. 

This example also provides interesting insights into the 
temperature distribution pattern in the United States. Consider 
the Fourier basis vector that most frequently (for 217 days out 
of 365) captures most energy of the snapshot s, i.e., yields 
the spectrum coefficient Sq in (|30] |. Fig. |6] shows the vector 
coefficients plotted on the representation graph according to 
the sensors' geographical coordinates, so the graph naturally 
takes the shape of the mainland US. It can be observed that 
this basis vector reflects the relative temperature distribution 
across the US: the south-eastern region is the hottest one, and 
the Great Lakes region is the coldest one f63l. 

Data Classification 

Classification and labeling are important problems in data 
analysis. These problems have traditionally been studied in 
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machine learning ||64ll . ||65]| . Here, we propose a novel data 
classification algorithm by demonstrating that a classifier 
system can be interpreted as a filter on a graph. Thus, the 
construction of an optimal classifier can be viewed and studied 
as the design of an adaptive graph filter. Our algorithm scales 
linearly with the data size N, which makes it an attractive 
alternative to existing classification methods based on neural 
networks and support vector machines. 

Our approach is based on the label propagation ll66l . ||67l , 
which is a simple, yet efficient technique that advances known 
labels from labeled graph nodes along edges to unlabeled 
nodes. Usually this propagation is modeled as a stationary 
discrete-time Markov process [68], and the graph adjacency 
matrix is constructed as a probability transition matrix, i.e., 
A.n,m > for all n,m, and AIat — In, where Iat is a 
column vector of N ones. Initially known labels determine 
the initial probability distribution s. For a binary classification 
problem with only two labels, the resulting labels are deter- 
mined by the distribution s = A^s. If s„ < 1/2, node v„ is 
assigned one label, and otherwise the other. The number P of 
propagations is determined heuristically. 

Our DSPg approach has two major distinctions from the 
original label propagation. First, we do not require A to 
be a stochastic matrix. We only assume that edge weights 
Afc „i > are non-negative and indicate similarity or depen- 
dency between nodes. In this case, nodes with positive labels 
Sn > are assigned to one class, and with negative labels to 
another. Second, instead of propagating labels as in a Markov 
chain, we construct a filter h{A) that produces labels 

s = h{A)s. (31) 

The following example illustrates our approach. Consider 
a set of TV = 1224 political blogs on the Web that we 
wish to classify as "conservative" or "liberal" based on their 
context [69]. Reading and labeling each blog is very time- 
consuming. Instead, we read and label only a few blogs, and 
use these labels to adaptively build a filter h{A) in dSTT i. 

Let signal s contain initially known labels, where "conser- 
vative," "liberal," and unclassified blogs are represented by 
values Sn — +1, —1, and 0, respectively. Also, let signal t 
contain training labels, a subset of known labels from s. Both 
s and t are represented by a graph G = (V, A), where node 
Vn containing the label of the nth blog, and edge A„.,„ — 1 
if and only if there is a hyperlink reference from the nth to 
the mth blog; hence the graph is directed. Observe that the 
discovery of hyperlink references is a fast, easily automated 
task, unlike reading the blogs. An example subgraph for 50 
blogs is shown in Fig. |l(d)| 

Recall that the graph shift A replaces each signal coefficient 
with a weighted combination of its neighbors. In this case, 
processing training labels t with the filter 

iN+hiA (32) 

produces new labels t = t + hiAt. Here, every node label 
is adjusted by a scaled sum of its neighbors' labels. The 
parameter hi can be interpreted as the "confidence" in our 
knowledge of current labels: the higher the confidence hi, the 



Fraction of initially known labels 

Blog selection method 

2% 5% 10% 

Random 87% 93% 95% 

Most hyperlinks 93% 94% 95% 

TABLE I 

Accuracy of blog classification using adaptive filters. 



more neighbors' labels should affect the current labels. We 
restrict the value of hi to be positive. 

Since the sign of each label indicates its class, label i„ is 
incorrect if its sign differs from s„, or i„s„ < for s„ ^ 
0. We determine the optimal value of hi by minimizing the 
total error, given by the number of incorrect and undecided 
labels. This is done in linear time proportional to the number 
of initially known labels s„ ^ 0, since each constraint 

inSn = ltn+ hi ^ ife s„ < (33) 

is a linear inequality constraint on hi. 

To propagate labels to all nodes, we repeatedly feed them 
through P filters ^ of the form h^P^A) = Iat +hpA, each 
time optimizing the value of hp using the greedy approach 
discussed above. The obtained adaptive classification filter is 

h{A) = {In +hpA){lN +hp^iA) ... (Iat +hiA). (34) 

In experiments, we set P ~ 10, since we observed that 
filter ( |34] | converges quickly, and in many cases, hp — 
for p > 10, which is similar to the actual graph's diameter 
of 8. After the filter ( |34] | is constructed, we apply it to all 
known labels s, and classify all N nodes based on the signs 
of resulting labels s — h{A)s. 

In our experiments, we considered two methods for se- 
lecting nodes to be labeled initially: random selection, and 
selection of blogs with most hyperlinks. As Table H] shows, 
our algorithm achieves high accuracy for both methods. In 
particular, assigning initial labels s to only 2% of blogs with 
most hyperlinks leads to the correct classification of 93 % of 
unlabeled blogs. 

Customer Behavior Prediction 

The adaptive filter design discussed in the previous example 
can be applied to other problems as well. Moreover, the linear 
computational cost of the filter design makes the approach 
easily scalable for the analysis of large datasets. Consider 
an example of a mobile service provider that is interested in 
keeping its customers. The company wants to predict which 
users will stop using their services in the near future, and offer 
them incentives for staying with the provider (improved call 
plan, discounted phones, etc.). In particular, based on their 
past behavior, such as number and length of calls within the 
network, the company wants to predict whether customers will 
stop using the services in the next month. 

This problem can be formulated similarly to the previous 
example. In this case, the value at node Vn of the representation 
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Fig. 7. The accuracy of behavior prediction for customers of a mobile 
provider. Predictions for customers who stopped using the provider and those 
who continued are evaluated separately, and then combined into the overall 
accuracy. 



by viewing them as signals indexed by graph nodes. We 
demonstrated that DSPq is a natural extension of the classical 
time-series DSP theory, and traditional definitions of the above 
DSP concepts and structures can be obtained using a graph 
representing discrete time series. We also provided example 
appUcations of DSPg to various social science applications, 
and our experimental results demonstrated the effectiveness 
of using the DSPq framework for datasets of different nature. 
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graph G = (V, A) indicates the probability that the nth 
customer will not use the provider services in the next 30 
days. The weight of a directed edge from node w„ to Vm is 
the fraction of time the ?ith customer called and talked to the 
mth customer; i.e., if T„ ,„ indicates the total time the nth 
customer called and talked to the mth customer in the past 
until the present moment, then 

T 



A — 



The initial input signal s has s„ = 1 if the customer has 
already stopped using the provider, and s„ = otherwise. 
As in the previous example, we design a classifier filter ( |34] |; 
we set P ~ 10. We then process the entire signal s with the 
designed filter obtaining the output signal s of the predicted 
probabilities; we conclude that the nth customer will stop 
using the provider if s„ > 1/2, and will continue if S„ < 1/2. 

In our preliminary experiments, we used a ten-month-long 
call log for approximately 3.5 million customers of a European 
mobile service provider, approximately 10% of whom stopped 
using the provider during this periocH. Fig. |7] shows the 
accuracy of predicting customer behavior for months 3-10 
using filters with at most L < 10 taps. The accuracy reflects 
the ratio of correct predictions for all customers, the ones 
who stop using the service and the ones who continue; it is 
important to correctly identify both classes, so the provider 
can focus on the proper set of customers. As can be seen from 
the results, the designed filters achieve high accuracy in the 
prediction of customer behavior. Unsurprisingly, the prediction 
accuracy increases as more information becomes available, 
since we optimize the filter for month K using cumulative 
information from preceding K — 1 months. 

VII. Conclusions 

We have proposed DSPg, a novel DSP theory for datasets 
whose underlying similarity or dependency relations are rep- 
resented by arbitrary graphs. Our framework extends funda- 
mental DSP structures and concepts, including shift, filters, 
signal and filter spaces, spectral decomposition, spectrum, 
Fourier transform, and frequency response, to such datasets 

*We use a large dataset on Call Detailed Records (CDRs) from a large 
mobile operator in one European country, which we call EURMO for short. 



Appendix A: Matrix Decomposition and Properties 

We review relevant properties of the Jordan normal form, 
and the characteristic and minimal polynomial of a matrix 
A e C^^^; for a thorough review, see 1571. Il58l. 

Jordan Normal Form 

Let Aq, . . . , Xi\i-i denote M < N distinct eigenvalues of 
A. Let each eigenvalue Am have D„i linearly independent 
eigenvectors v™ o, • ■ • , Vm £)„,~i. The Dm is the geometric 
multiplicity of Am- Each eigenvector Vm.d generates a Jordan 
chain of Rm,d > 1 linearly independent generalized eigen- 
vectors 'Vm,d,r, < r < RmM, whcrc Vm,dfi = Vm,d, that 
satisfy 

(A — A„i I)v„i^d,r ~ V„i,ci,r-1- (35) 

For each eigenvector Vm.d and its Jordan chain of length 
Rm d, we define a Jordan block matrix of dimension Rm, d as 



(K 



\ 



1 



1 

Am / 



(36) 



Thus, each eigenvalue Am is associated with Dm Jordan 
blocks, each with dimension RmM < d < D„i- Next, 
for each eigenvector 'Vm,d, we collect its Jordan chain into 
a N X Rm,d matrix 



,,d.O 



Vm,d,_R„ 







(37) 



We concatenate all blocks Vm.d, < d < D„i and < rn < 
M, into one block matrix 



V = V 



0,0 



V 



M-1.Dm- 



(38) 



so that Vm.d is at position J2T=o ^k + d in this matrix. Then, 

(39) 



matrix A has the Jordan decomposition 

A = VJV \ 
where the block-diagonal matrix 

(J-Ro,o(Ao) 

is called the Jordan normal form of A. 



(40) 
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Minimal and Characteristic Polynomials 

The minimal polynomial of matrix A is the monic polyno- 
mial of smallest possible degree that satisfies mA(A) ~ On- 
Let R,n = inax{i?„i.o, • • • , i?m,_D„,-i} denote the maximum 
length of Jordan chains corresponding to eigenvalue A™. Then 
the minimal polynomial mj^{x) is given by 



(41) 



The index of A™ is 1 < m < Af. Any polynomial 

p{x) that satisfies p(A) = Oat, is a polynomial multiple of 
mA{x), i.e., p{x) = q{x)mA.{x). The degree of the minimal 
polynomial satisfies 



j\/-i 



degmA(a;) = = 2^ i?m < N. (42) 

The characteristic polynomial of the matrix A is defined as 

Pa(x) =det(AI-A) = (x-Ao)^°...(a;-AM-i)^"-^ 

(43) 

Here: = i?m,o + . . . + Rrn,D„,-i for < to < M, is 
the algebraic multiplicity of Am,; degpA(a;) = A^; PA(a;) is 
a multiple of TOA(a;); and Pa (a;) = to a (a;) if and only if 
the geometric multiplicity of each A™, D„i = 1, i.e., each 
eigenvalue Am has exactly one eigenvector. 

Appendix B: Proof of Theorem|2] 

We will use the following lemma to prove Theorem |2] 
Lemma 1: For polynomials h{x), g{x), and p{x) — 
h{x)g{x), and a Jordan block Jr{X) as in ( |36] | of arbitrary 
dimension r and eigenvalue A, the following equality holds: 



/^(J,(A)).9(J,(A)) ==p(J,(A)). 
Proof: The (i, j)th element of h{3r{X)) is 



(44) 



(45) 



for j > i and otherwise, where /i'-'^*^(A) is the (j — 
i)th derivative of h{X) ll58l . Hence, the (i,j)th element of 
h{Jr{X))g{3r{X)) for < i is zero and for j > i is 

3 

^ ^(Jr(A))i,fcg(Jr(A))fej 
k=i 

= i:7F^''"-"(-^)77^^'"-"(A) 



(fc-i)! 
1 ' 



k—i 



k — i 
j - I 

TO 



/j(™)(A)gO-»-™)(A) 



1 



-TT(/i(A)5(A)) 



(46) 



Matrix equality ( |44| ) follows by comparing ( |46] l with ( |45T l. ■ 
As before, let Aq, . . . , Ajv/-i denote distinct eigenvalues of 
A. Consider the Jordan decomposition ( [39] l of A. For each 
< TO < M, select distinct numbers Am o, ■ ■ ■ , Am,i5„-i, so 



that all Am,d for < d < and < to < A/ are distinct. 
Construct the block-diagonal matrix 

/J-Ro.q(^0,o) \ 



The Jordan blocks on the diagonal of J match the sizes of the 
Jordan blocks of J in (|40] |. but their elements are different. 

Consider a polynomial r{x) — ro + rix + . . . + rN^ix^^^, 
and assume that r(J) = J. By Lemma [T] this is equivalent to 

^(Am.d) = Am, 

r-(i)(A™,d) = l 

^r-W(Am,d) =0, fov2<i<Dra 

for all < (i < Dm and < m < M. This is a system of 
linear equations with N unknowns vq, . . . , tm-i that can be 
uniquely solved using inverse polynomial interpolation |58|. 

Usmg dsn, we^obtain A = VJV"^ =^Vr(J)V"^ = 
r(V J V^^) — r(A). Furthermore, since all Xm,d distinct 
numbers, their geometric multiplicities are equal to 1. As dis- 
cussed in Appendix A, this is equivalent to Pp^{x) — m^{x). 

Appendix C: Proof of Theorem|4] 

Lemma [T] leads to the construction procedure of the inverse 
polynomial g{x) of h{x), when it exists, and whose matrix 
representation satisfies g{A)h{A) — Ijy. Observe that this 
condition, together with ( l44l l. is equivalent to 



jh{Xm)g{Xrn) =1, for < TO < Af - 1 

j {h{X„,)g{Xm)Y"^ = 0' for 1 < ^ < R,-,,. 



(47) 



Here, i?m is the degree of the factor (a; — Am)^'" in the 
minimal polynomial mA(A) in dTTl l. Since values of h{x) and 
its derivatives at Am are known, ( |47] | amount to A'a linear 
equations with A^a unknowns. They have a unique solution 
if and only if h{Xm) 7^ for all Am, and the coefficients 
5o,...,gMA-i are then uniquely determined using inverse 
polynomial interpolation [58|. 
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